Overview

Dataset statistics

Number of variables: 30
Number of observations: 990
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 239.8 KiB
Average record size in memory: 248.0 B
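The missing-cell and duplicate-row figures above can be recomputed for any table. The sketch below is a minimal standard-library illustration. It assumes that rows are hashable tuples of equal length and that `None` marks a missing cell. It counts every repeat after a row's first occurrence as one duplicate, which may not match ydata-profiling's own definition. The average record size is the total size divided by the number of observations: 239.8 KiB × 1024 / 990 ≈ 248.0 B. The helper names are illustrative.

```python
from collections.abc import Sequence
from typing import Any


def dataset_stats(rows: Sequence[tuple[Any, ...]]) -> dict[str, float]:
    """Missing-cell and duplicate-row counts for a table of row tuples.

    Assumptions (not taken from ydata-profiling): every row has the same
    length, cells are hashable, and None marks a missing cell.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)

    # Count each repeat after a row's first occurrence as one duplicate.
    seen: set[tuple[Any, ...]] = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


def avg_record_size_bytes(total_bytes: float, n_obs: int) -> float:
    """Average in-memory size of one record: total size / number of rows."""
    return total_bytes / n_obs
```

For this dataset, `avg_record_size_bytes(239.8 * 1024, 990)` rounds to 248.0.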

Variable types

Numeric: 6
Categorical: 24

Alerts

ECOG is highly overall correlated with ECOG_encoded [High correlation]
ECOG_encoded is highly overall correlated with ECOG [High correlation]
OS is highly overall correlated with PFS and 3 other fields [High correlation]
PFS is highly overall correlated with OS and 6 other fields [High correlation]
age is highly overall correlated with age_encoded [High correlation]
age_encoded is highly overall correlated with age [High correlation]
best_response is highly overall correlated with PFS and 5 other fields [High correlation]
best_response_encoded is highly overall correlated with PFS and 5 other fields [High correlation]
clin_benefit=Yes is highly overall correlated with OS and 6 other fields [High correlation]
msi_type is highly overall correlated with msi_type_encoded [High correlation]
msi_type_encoded is highly overall correlated with msi_type [High correlation]
progression=Yes is highly overall correlated with PFS and 5 other fields [High correlation]
response=Yes is highly overall correlated with OS and 5 other fields [High correlation]
stage is highly overall correlated with stage_encoded [High correlation]
stage_encoded is highly overall correlated with stage [High correlation]
tx_line is highly overall correlated with tx_line_encoded [High correlation]
tx_line_encoded is highly overall correlated with tx_line [High correlation]
vital_status is highly overall correlated with OS and 5 other fields [High correlation]
msi_type is highly imbalanced (71.8%) [Imbalance]
stage is highly imbalanced (81.9%) [Imbalance]
cancer_type=LGI is highly imbalanced (67.4%) [Imbalance]
drug_class=ctla-4 is highly imbalanced (95.4%) [Imbalance]
stage_encoded is highly imbalanced (81.9%) [Imbalance]
msi_type_encoded is highly imbalanced (71.8%) [Imbalance]
id has unique values [Unique]
tmb_mutations_mb has 22 (2.2%) zeros [Zeros]
age_encoded has 16 (1.6%) zeros [Zeros]
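For a pair of categorical columns, an association score such as Cramér's V is one way to produce a "high correlation" flag like those above. ydata-profiling's default "auto" correlation is understood to use Cramér's V for categorical pairs. This report does not show the exact variant or threshold, so treat the sketch below as an illustration rather than the library's implementation. It computes plain Cramér's V with no bias correction. A column and any one-to-one re-encoding of it always score 1.0, which would explain why each `*_encoded` pair is flagged. The integer encoding of stage used in the usage example is hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Cramér's V between two categorical columns (no bias correction)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared over every cell of the contingency table,
    # including cells that were never observed (observed count 0).
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * (k - 1)))
```

Usage: building `stage` from its reported counts (930 × IV, 54 × III, 5 × II, 1 × I) and mapping it one-to-one to integers gives a score of 1.0.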

Reproduction

Analysis started: 2024-03-10 19:35:27.520662
Analysis finished: 2024-03-10 19:35:30.493799
Duration: 2.97 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

id
Real number (ℝ)

UNIQUE 

Distinct: 990
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9153.2323
Minimum: 8215
Maximum: 10289
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 KiB

Quantile statistics

Minimum: 8215
5th percentile: 8298.45
Q1: 8635.5
median: 9151.5
Q3: 9670.75
95th percentile: 9971.1
Maximum: 10289
Range: 2074
Interquartile range (IQR): 1035.25

Descriptive statistics

Standard deviation: 569.91101
Coefficient of variation (CV): 0.062263361
Kurtosis: -1.3572457
Mean: 9153.2323
Median Absolute Deviation (MAD): 519
Skewness: -0.017417286
Sum: 9061700
Variance: 324798.56
Monotonicity: Not monotonic
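The quantile and descriptive statistics above follow standard definitions. The sketch below recomputes them with the standard library under three assumptions about how the report was produced:

- quantiles use linear interpolation between order statistics (the default rule in NumPy and pandas);
- the variance is the sample (n − 1) variance;
- the median absolute deviation is taken around the median without a scale factor.

An unscaled MAD is consistent with id's MAD of 519 being about a quarter of its 2074-wide range. The monotonicity helper is a simplified, hypothetical version that allows ties.

```python
import math
import statistics
from collections.abc import Sequence


def quantile_linear(sorted_xs: Sequence[float], q: float) -> float:
    """q-quantile of already-sorted data, interpolating linearly between
    the two nearest order statistics."""
    pos = (len(sorted_xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def describe(xs: Sequence[float]) -> dict[str, float]:
    s = sorted(xs)
    mean = statistics.fmean(s)
    variance = statistics.variance(s)  # sample variance, divides by n - 1
    std = math.sqrt(variance)
    median = quantile_linear(s, 0.5)
    q1, q3 = quantile_linear(s, 0.25), quantile_linear(s, 0.75)
    return {
        "min": s[0],
        "p5": quantile_linear(s, 0.05),
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": quantile_linear(s, 0.95),
        "max": s[-1],
        "range": s[-1] - s[0],
        "iqr": q3 - q1,
        "std": std,
        "variance": variance,
        "cv": std / mean,  # undefined when the mean is 0
        "mad": statistics.median(abs(x - median) for x in s),  # unscaled
        "sum": math.fsum(s),
    }


def monotonicity(xs: Sequence[float]) -> str:
    """Simplified label: ties are allowed in either direction."""
    pairs = list(zip(xs, xs[1:]))
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"
```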
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
8215 | 1 | 0.1%
9510 | 1 | 0.1%
9514 | 1 | 0.1%
9517 | 1 | 0.1%
9518 | 1 | 0.1%
9519 | 1 | 0.1%
9522 | 1 | 0.1%
9523 | 1 | 0.1%
9526 | 1 | 0.1%
9527 | 1 | 0.1%
Other values (980) | 980 | 99.0%

Minimum 10 values

Value | Count | Frequency (%)
8215 | 1 | 0.1%
8216 | 1 | 0.1%
8217 | 1 | 0.1%
8219 | 1 | 0.1%
8221 | 1 | 0.1%
8222 | 1 | 0.1%
8223 | 1 | 0.1%
8226 | 1 | 0.1%
8229 | 1 | 0.1%
8230 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
10289 | 1 | 0.1%
10276 | 1 | 0.1%
10255 | 1 | 0.1%
10254 | 1 | 0.1%
10251 | 1 | 0.1%
10246 | 1 | 0.1%
10245 | 1 | 0.1%
10239 | 1 | 0.1%
10238 | 1 | 0.1%
10032 | 1 | 0.1%
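The minimum and maximum tables list the 10 smallest and 10 largest distinct values, with how often each occurs and its share of all rows. For example, tmb_mutations_mb's smallest value, 0, occurs 22 times. The sketch below is a minimal version built on `heapq` and assumes the column fits in memory. The helper name is illustrative.

```python
import heapq
from collections import Counter
from collections.abc import Iterable


def extreme_values(xs: Iterable[float], k: int = 10):
    """Return (smallest, largest): lists of (value, count, percent) for the
    k smallest and k largest distinct values."""
    counts = Counter(xs)
    n = sum(counts.values())

    def row(v: float) -> tuple[float, int, float]:
        return v, counts[v], 100.0 * counts[v] / n

    # Iterating a Counter yields each distinct value once.
    smallest = [row(v) for v in heapq.nsmallest(k, counts)]
    largest = [row(v) for v in heapq.nlargest(k, counts)]
    return smallest, largest
```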

tx_year
Categorical

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 2017 (356), 2018 (353), 2016 (210), 2015 (71)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 3960
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018
2nd row: 2015
3rd row: 2018
4th row: 2017
5th row: 2016

Common Values

Value | Count | Frequency (%)
2017 | 356 | 36.0%
2018 | 353 | 35.7%
2016 | 210 | 21.2%
2015 | 71 | 7.2%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
2 | 990 | 25.0%
0 | 990 | 25.0%
1 | 990 | 25.0%
7 | 356 | 9.0%
8 | 353 | 8.9%
6 | 210 | 5.3%
5 | 71 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3960 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3960 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3960 | 100.0%

age
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 61 - 70 (346), 51 - 60 (241), 71 - 95 (230), 41 - 50 (109), 31 - 40 (48)

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 6930
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 61 - 70
2nd row: 51 - 60
3rd row: 51 - 60
4th row: 51 - 60
5th row: 61 - 70

Common Values

Value | Count | Frequency (%)
61 - 70 | 346 | 34.9%
51 - 60 | 241 | 24.3%
71 - 95 | 230 | 23.2%
41 - 50 | 109 | 11.0%
31 - 40 | 48 | 4.8%
21 - 30 | 16 | 1.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
(blank) | 990 | 33.3%
61 | 346 | 11.6%
70 | 346 | 11.6%
51 | 241 | 8.1%
60 | 241 | 8.1%
71 | 230 | 7.7%
95 | 230 | 7.7%
41 | 109 | 3.7%
50 | 109 | 3.7%
31 | 48 | 1.6%
Other values (3) | 80 | 2.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1980 | 28.6%
1 | 990 | 14.3%
- | 990 | 14.3%
0 | 760 | 11.0%
6 | 587 | 8.5%
5 | 580 | 8.4%
7 | 576 | 8.3%
9 | 230 | 3.3%
4 | 157 | 2.3%
3 | 64 | 0.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3960 | 57.1%
Space Separator | 1980 | 28.6%
Dash Punctuation | 990 | 14.3%
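The category names in these tables are Unicode General Category values. The sketch below uses Python's `unicodedata` module and takes the value counts from age's Common Values table as input. It reproduces the age totals: each "61 - 70"-style value contributes four Decimal Number characters, two spaces and one hyphen-minus. The code-to-name mapping covers only the categories that appear in this report, and the helper name is illustrative.

```python
import unicodedata
from collections import Counter
from collections.abc import Mapping

# Long names for the General Category codes that occur in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
}


def character_categories(value_counts: Mapping[str, int]) -> Counter[str]:
    """Count characters by Unicode General Category, weighting each distinct
    value by the number of rows that contain it."""
    totals: Counter[str] = Counter()
    for value, count in value_counts.items():
        for ch in value:
            code = unicodedata.category(ch)
            totals[CATEGORY_NAMES.get(code, code)] += count
    return totals
```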

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 990 | 25.0%
0 | 760 | 19.2%
6 | 587 | 14.8%
5 | 580 | 14.6%
7 | 576 | 14.5%
9 | 230 | 5.8%
4 | 157 | 4.0%
3 | 64 | 1.6%
2 | 16 | 0.4%

Space Separator
Value | Count | Frequency (%)
(space) | 1980 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 6930 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6930 | 100.0%

nlr
Real number (ℝ)

Distinct: 520
Distinct (%): 52.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.4006869
Minimum: 0.3
Maximum: 87
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 KiB

Quantile statistics

Minimum: 0.3
5th percentile: 1.53
Q1: 2.86
median: 4.42
Q3: 7.13
95th percentile: 16.714
Maximum: 87
Range: 86.7
Interquartile range (IQR): 4.27

Descriptive statistics

Standard deviation: 7.0841867
Coefficient of variation (CV): 1.1067854
Kurtosis: 36.310545
Mean: 6.4006869
Median Absolute Deviation (MAD): 1.825
Skewness: 4.8687374
Sum: 6336.68
Variance: 50.185701
Monotonicity: Not monotonic
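nlr's skewness (4.87) and kurtosis (36.3) describe a long right tail: the maximum of 87 lies far above the 95th percentile of 16.714. The sketch below assumes the report uses pandas' estimators: the adjusted Fisher-Pearson skewness and the excess kurtosis with the matching small-sample correction. Under that assumption, both statistics can be computed as follows. The function names are illustrative.

```python
import math
from collections.abc import Sequence


def _central_moments(xs: Sequence[float]) -> tuple[int, float, float, float]:
    """n and the biased 2nd, 3rd and 4th central moments."""
    n = len(xs)
    m = math.fsum(xs) / n
    m2 = math.fsum((x - m) ** 2 for x in xs) / n
    m3 = math.fsum((x - m) ** 3 for x in xs) / n
    m4 = math.fsum((x - m) ** 4 for x in xs) / n
    return n, m2, m3, m4


def skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness G1 (needs n >= 3)."""
    n, m2, m3, _ = _central_moments(xs)
    if n < 3 or m2 == 0:
        raise ValueError("need at least 3 values that are not all equal")
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Sample excess kurtosis G2 (needs n >= 4); about 0 for normal data."""
    n, m2, _, m4 = _central_moments(xs)
    if n < 4 or m2 == 0:
        raise ValueError("need at least 4 values that are not all equal")
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
```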
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
4 | 16 | 1.6%
2 | 12 | 1.2%
7 | 10 | 1.0%
3 | 10 | 1.0%
6 | 9 | 0.9%
5 | 8 | 0.8%
5.75 | 8 | 0.8%
6.5 | 7 | 0.7%
3.67 | 7 | 0.7%
2.63 | 6 | 0.6%
Other values (510) | 897 | 90.6%
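Each common-values table lists the ten most frequent values and folds the rest into an "Other values (k)" row. Here k is the number of remaining distinct values, and every percentage is relative to all 990 rows. The sketch below is a minimal version built on `collections.Counter`. When values are tied (every id occurs exactly once), its ordering may differ from the report's. The helper name is illustrative.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def common_values(xs: Iterable[Hashable], top: int = 10):
    """Top-`top` values as (value, count, percent) rows, plus an
    'Other values (k)' row covering the remaining distinct values."""
    counts = Counter(xs)
    n = sum(counts.values())
    rows = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    rest = len(counts) - len(rows)
    if rest > 0:
        other = n - sum(c for _, c, _ in rows)
        rows.append((f"Other values ({rest})", other, 100.0 * other / n))
    return rows
```

With 990 distinct ids, for instance, the last row is ("Other values (980)", 980, ≈99.0), as in the id table.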
Minimum 10 values

Value | Count | Frequency (%)
0.3 | 1 | 0.1%
0.47 | 1 | 0.1%
0.65 | 1 | 0.1%
0.71 | 1 | 0.1%
0.74 | 1 | 0.1%
0.77 | 1 | 0.1%
0.79 | 1 | 0.1%
0.8 | 1 | 0.1%
0.95 | 1 | 0.1%
0.98 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
87 | 1 | 0.1%
77.33 | 1 | 0.1%
52 | 1 | 0.1%
51.75 | 1 | 0.1%
49.71 | 1 | 0.1%
49.5 | 1 | 0.1%
49.25 | 1 | 0.1%
43.67 | 1 | 0.1%
43.5 | 1 | 0.1%
37.5 | 1 | 0.1%

msi_type
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: Stable (918), Indeterminate (41), Unstable (31)

Length

Max length: 13
Median length: 6
Mean length: 6.3525253
Min length: 6
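Length statistics are taken over rows, not over distinct values. That is why msi_type's median length is 6 even though one of its three values is 13 characters long. Using the counts from this variable's Common Values table, the mean length is (918 × 6 + 41 × 13 + 31 × 8) / 990 = 6289 / 990 ≈ 6.3525. The minimal sketch below computes the same figures from a value-to-count mapping. The helper name is illustrative.

```python
import statistics
from collections.abc import Mapping


def length_stats(value_counts: Mapping[str, int]) -> dict[str, float]:
    """Max/median/mean/min string length, weighting each distinct value by
    its row count so that the median is a median over rows."""
    lengths = [len(v) for v, c in value_counts.items() for _ in range(c)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }
```

The same helper applied to tx_line's counts (667 × "Subsequent-line", 323 × "First-line") gives a mean of about 13.3687 and 13235 characters in total.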

Characters and Unicode

Total characters: 6289
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Stable
2nd row: Stable
3rd row: Stable
4th row: Stable
5th row: Indeterminate

Common Values

Value | Count | Frequency (%)
Stable | 918 | 92.7%
Indeterminate | 41 | 4.1%
Unstable | 31 | 3.1%
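The imbalance percentages in the Alerts section are consistent with a normalized-entropy score. The score is 1 minus the Shannon entropy of the class proportions, with the logarithm base equal to the number of classes. It is 0 for a perfectly balanced column and 1 when every row falls in a single class. Under that assumption, the sketch below reproduces 71.8% for msi_type from the counts above, and 81.9% and 67.4% for stage and cancer_type=LGI from their reported counts. It is a reconstruction checked against these three alerts, not ydata-profiling's source. The function name is illustrative.

```python
import math
from collections.abc import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 - (entropy of the class proportions in base n_classes)."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has nothing to balance against
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)
```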

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
stable | 918 | 92.7%
indeterminate | 41 | 4.1%
unstable | 31 | 3.1%

Most occurring characters

Value | Count | Frequency (%)
e | 1072 | 17.0%
t | 1031 | 16.4%
a | 990 | 15.7%
b | 949 | 15.1%
l | 949 | 15.1%
S | 918 | 14.6%
n | 113 | 1.8%
I | 41 | 0.7%
d | 41 | 0.7%
r | 41 | 0.7%
Other values (4) | 144 | 2.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5299 | 84.3%
Uppercase Letter | 990 | 15.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1072 | 20.2%
t | 1031 | 19.5%
a | 990 | 18.7%
b | 949 | 17.9%
l | 949 | 17.9%
n | 113 | 2.1%
d | 41 | 0.8%
r | 41 | 0.8%
m | 41 | 0.8%
i | 41 | 0.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 918 | 92.7%
I | 41 | 4.1%
U | 31 | 3.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6289 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6289 | 100.0%

tmb_mutations_mb
Real number (ℝ)

ZEROS 

Distinct: 112
Distinct (%): 11.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.735758
Minimum: 0
Maximum: 368.6
Zeros: 22
Zeros (%): 2.2%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3.5
median: 6.1
Q3: 11.8
95th percentile: 43.3
Maximum: 368.6
Range: 368.6
Interquartile range (IQR): 8.3

Descriptive statistics

Standard deviation: 20.649885
Coefficient of variation (CV): 1.7595698
Kurtosis: 103.20754
Mean: 11.735758
Median Absolute Deviation (MAD): 3.5
Skewness: 7.8953863
Sum: 11618.4
Variance: 426.41775
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
4.4 | 62 | 6.3%
3.5 | 57 | 5.8%
2.6 | 49 | 4.9%
5.3 | 48 | 4.8%
7.9 | 48 | 4.8%
6.1 | 46 | 4.6%
3.9 | 38 | 3.8%
1.8 | 36 | 3.6%
3 | 35 | 3.5%
2 | 30 | 3.0%
Other values (102) | 541 | 54.6%

Minimum 10 values

Value | Count | Frequency (%)
0 | 22 | 2.2%
0.9 | 27 | 2.7%
1 | 18 | 1.8%
1.8 | 36 | 3.6%
2 | 30 | 3.0%
2.6 | 49 | 4.9%
3 | 35 | 3.5%
3.5 | 57 | 5.8%
3.9 | 38 | 3.8%
4.4 | 62 | 6.3%

Maximum 10 values

Value | Count | Frequency (%)
368.6 | 1 | 0.1%
178.2 | 1 | 0.1%
158.9 | 1 | 0.1%
153.5 | 1 | 0.1%
144.8 | 1 | 0.1%
131.7 | 1 | 0.1%
111.5 | 1 | 0.1%
102.7 | 1 | 0.1%
102.3 | 1 | 0.1%
93.5 | 1 | 0.1%

best_response
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: PD (519), PR (212), SD (199), CR (60)

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1980
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PR
2nd row: PD
3rd row: PR
4th row: PR
5th row: PD

Common Values

Value | Count | Frequency (%)
PD | 519 | 52.4%
PR | 212 | 21.4%
SD | 199 | 20.1%
CR | 60 | 6.1%
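The best_response counts match the response=Yes column: PR + CR = 212 + 60 = 272, which is exactly the number of rows where response=Yes is 1.0 (27.5%). The sketch below turns that relationship into a consistency check. The rule that response=Yes means "best response is PR or CR" is inferred from the counts and is not documented in this report. The constant and function names are hypothetical.

```python
from collections.abc import Mapping

# Hypothetical rule inferred from the counts in this report:
# response=Yes is 1.0 exactly when best_response is PR or CR.
RESPONDER_CATEGORIES = frozenset({"PR", "CR"})


def implied_flag(best_response_counts: Mapping[str, int],
                 positive=RESPONDER_CATEGORIES) -> tuple[int, float]:
    """Number and percentage of rows whose category is in `positive`."""
    n = sum(best_response_counts.values())
    hits = sum(c for cat, c in best_response_counts.items() if cat in positive)
    return hits, 100.0 * hits / n
```

A mismatch between the implied count and the observed response=Yes count would point to a different derivation rule or to inconsistent source data.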

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
pd | 519 | 52.4%
pr | 212 | 21.4%
sd | 199 | 20.1%
cr | 60 | 6.1%

Most occurring characters

Value | Count | Frequency (%)
P | 731 | 36.9%
D | 718 | 36.3%
R | 272 | 13.7%
S | 199 | 10.1%
C | 60 | 3.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1980 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1980 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1980 | 100.0%

PFS
Real number (ℝ)

HIGH CORRELATION 

Distinct: 395
Distinct (%): 39.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.7940202
Minimum: 0.1
Maximum: 52.44
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 KiB

Quantile statistics

Minimum: 0.1
5th percentile: 0.59
Q1: 1.3575
median: 2.6
Q3: 8.215
95th percentile: 27.37
Maximum: 52.44
Range: 52.34
Interquartile range (IQR): 6.8575

Descriptive statistics

Standard deviation: 9.3614444
Coefficient of variation (CV): 1.3778947
Kurtosis: 5.7203496
Mean: 6.7940202
Median Absolute Deviation (MAD): 1.71
Skewness: 2.3525151
Sum: 6726.08
Variance: 87.636642
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
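"Fixed size bins" means the range from the minimum to the maximum is cut into 50 intervals of equal width. For PFS that width is (52.44 − 0.1) / 50 ≈ 1.05. The sketch below assumes NumPy-style bins: every interval is half-open except the last, which also includes the maximum. The helper name is illustrative.

```python
from collections.abc import Sequence


def fixed_width_histogram(xs: Sequence[float], bins: int = 50):
    """Return (counts, edges) for `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which also includes
    the maximum. A value lying exactly on an edge may occasionally be
    binned differently from NumPy because of floating-point rounding.
    """
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # the maximum falls into the last bin
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges
```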

Common values

Value | Count | Frequency (%)
1.35 | 18 | 1.8%
1.61 | 15 | 1.5%
1.38 | 15 | 1.5%
1.15 | 14 | 1.4%
1.68 | 14 | 1.4%
1.41 | 13 | 1.3%
0.69 | 12 | 1.2%
0.89 | 12 | 1.2%
1.18 | 12 | 1.2%
0.92 | 12 | 1.2%
Other values (385) | 853 | 86.2%

Minimum 10 values

Value | Count | Frequency (%)
0.1 | 1 | 0.1%
0.13 | 2 | 0.2%
0.2 | 4 | 0.4%
0.23 | 5 | 0.5%
0.26 | 3 | 0.3%
0.3 | 3 | 0.3%
0.33 | 1 | 0.1%
0.36 | 3 | 0.3%
0.39 | 3 | 0.3%
0.43 | 2 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
52.44 | 1 | 0.1%
51.75 | 1 | 0.1%
50.23 | 1 | 0.1%
49.97 | 1 | 0.1%
46.88 | 1 | 0.1%
46.69 | 1 | 0.1%
46.46 | 1 | 0.1%
46.36 | 1 | 0.1%
45.96 | 1 | 0.1%
45.54 | 1 | 0.1%

vital_status
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 1.0 (610), 0.0 (380)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 610 | 61.6%
0.0 | 380 | 38.4%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1370 | 46.1%
. | 990 | 33.3%
1 | 610 | 20.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1370 | 69.2%
1 | 610 | 30.8%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

OS
Real number (ℝ)

HIGH CORRELATION 

Distinct: 606
Distinct (%): 61.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.378475
Minimum: 0.1
Maximum: 52.44
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 KiB

Quantile statistics

Minimum: 0.1
5th percentile: 0.89
Q1: 4.1175
median: 10.775
Q3: 19.55
95th percentile: 36.815
Maximum: 52.44
Range: 52.34
Interquartile range (IQR): 15.4325

Descriptive statistics

Standard deviation: 11.1894
Coefficient of variation (CV): 0.83637336
Kurtosis: 0.73721163
Mean: 13.378475
Median Absolute Deviation (MAD): 7.295
Skewness: 1.0731896
Sum: 13244.69
Variance: 125.20267
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
3.88 | 6 | 0.6%
2.69 | 5 | 0.5%
1.74 | 5 | 0.5%
7.95 | 5 | 0.5%
0.95 | 5 | 0.5%
0.99 | 5 | 0.5%
14.95 | 5 | 0.5%
3.35 | 4 | 0.4%
0.92 | 4 | 0.4%
0.85 | 4 | 0.4%
Other values (596) | 942 | 95.2%

Minimum 10 values

Value | Count | Frequency (%)
0.1 | 1 | 0.1%
0.13 | 2 | 0.2%
0.2 | 1 | 0.1%
0.23 | 3 | 0.3%
0.3 | 1 | 0.1%
0.33 | 1 | 0.1%
0.36 | 3 | 0.3%
0.39 | 2 | 0.2%
0.43 | 1 | 0.1%
0.46 | 2 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
52.44 | 1 | 0.1%
52.04 | 1 | 0.1%
51.75 | 1 | 0.1%
50.14 | 1 | 0.1%
50.07 | 1 | 0.1%
49.97 | 1 | 0.1%
49.91 | 1 | 0.1%
48.82 | 1 | 0.1%
48.03 | 1 | 0.1%
47.47 | 1 | 0.1%

stage
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: IV (930), III (54), II (5), I (1)

Length

Max length: 3
Median length: 2
Mean length: 2.0535354
Min length: 1

Characters and Unicode

Total characters: 2033
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: IV
2nd row: IV
3rd row: IV
4th row: IV
5th row: IV

Common Values

Value | Count | Frequency (%)
IV | 930 | 93.9%
III | 54 | 5.5%
II | 5 | 0.5%
I | 1 | 0.1%
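The report distinguishes distinct values (how many different values occur) from unique values (values that occur in exactly one row). For stage, 4 values are distinct but only I is unique, because it appears once. Both percentages are relative to the 990 rows. The sketch below is a minimal version that works from a value-to-count mapping. The helper name is illustrative.

```python
from collections.abc import Hashable, Mapping


def distinct_and_unique(value_counts: Mapping[Hashable, int]) -> dict[str, float]:
    """Distinct = number of different values; unique = values seen exactly once."""
    n = sum(value_counts.values())
    distinct = len(value_counts)
    unique = sum(1 for c in value_counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
    }
```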

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
iv | 930 | 93.9%
iii | 54 | 5.5%
ii | 5 | 0.5%
i | 1 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
I | 1103 | 54.3%
V | 930 | 45.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2033 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2033 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2033 | 100.0%

tx_line
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: Subsequent-line (667), First-line (323)

Length

Max length: 15
Median length: 15
Mean length: 13.368687
Min length: 10

Characters and Unicode

Total characters: 13235
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Subsequent-line
2nd row: First-line
3rd row: Subsequent-line
4th row: First-line
5th row: Subsequent-line

Common Values

Value | Count | Frequency (%)
Subsequent-line | 667 | 67.4%
First-line | 323 | 32.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
subsequent-line | 667 | 67.4%
first-line | 323 | 32.6%

Most occurring characters

Value | Count | Frequency (%)
e | 2324 | 17.6%
n | 1657 | 12.5%
u | 1334 | 10.1%
i | 1313 | 9.9%
s | 990 | 7.5%
t | 990 | 7.5%
- | 990 | 7.5%
l | 990 | 7.5%
S | 667 | 5.0%
b | 667 | 5.0%
Other values (3) | 1313 | 9.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 11255 | 85.0%
Dash Punctuation | 990 | 7.5%
Uppercase Letter | 990 | 7.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 2324 | 20.6%
n | 1657 | 14.7%
u | 1334 | 11.9%
i | 1313 | 11.7%
s | 990 | 8.8%
t | 990 | 8.8%
l | 990 | 8.8%
b | 667 | 5.9%
q | 667 | 5.9%
r | 323 | 2.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 667 | 67.4%
F | 323 | 32.6%

Dash Punctuation
Value | Count | Frequency (%)
- | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12245 | 92.5%
Common | 990 | 7.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 2324 | 19.0%
n | 1657 | 13.5%
u | 1334 | 10.9%
i | 1313 | 10.7%
s | 990 | 8.1%
t | 990 | 8.1%
l | 990 | 8.1%
S | 667 | 5.4%
b | 667 | 5.4%
q | 667 | 5.4%
Other values (2) | 646 | 5.3%

Common
Value | Count | Frequency (%)
- | 990 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 13235 | 100.0%

ECOG
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 1.0 (544), 0.0 (357), 2.0 (76), 3.0 (11), 4.0 (2)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 544 | 54.9%
0.0 | 357 | 36.1%
2.0 | 76 | 7.7%
3.0 | 11 | 1.1%
4.0 | 2 | 0.2%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1347 | 45.4%
. | 990 | 33.3%
1 | 544 | 18.3%
2 | 76 | 2.6%
3 | 11 | 0.4%
4 | 2 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1347 | 68.0%
1 | 544 | 27.5%
2 | 76 | 3.8%
3 | 11 | 0.6%
4 | 2 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

response=Yes
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 0.0 (718), 1.0 (272)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 718 | 72.5%
1.0 | 272 | 27.5%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1708 | 57.5%
. | 990 | 33.3%
1 | 272 | 9.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1708 | 86.3%
1 | 272 | 13.7%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

clin_benefit=Yes
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 0.0 (674), 1.0 (316)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 674 | 68.1%
1.0 | 316 | 31.9%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1664 | 56.0%
. | 990 | 33.3%
1 | 316 | 10.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1664 | 84.0%
1 | 316 | 16.0%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

cancer_type=GU
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 0.0 (851), 1.0 (139)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 851 | 86.0%
1.0 | 139 | 14.0%
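Several columns in this report are named column=level (cancer_type=GU, response=Yes, clin_benefit=Yes) and hold only 0.0 and 1.0. That pattern looks like one-hot encoding with "=" as the separator between the original column name and the level. This is an inference from the names alone. The sketch below shows one hypothetical way to produce such columns with the standard library. pandas.get_dummies applied to a DataFrame with prefix_sep="=" and dtype=float yields the same naming scheme.

```python
from collections.abc import Sequence


def one_hot(values: Sequence[str], column: str, sep: str = "=") -> dict[str, list[float]]:
    """Hypothetical one-hot encoder: one 0.0/1.0 column per level, named
    f"{column}{sep}{level}", with levels in sorted order."""
    levels = sorted(set(values))
    return {
        f"{column}{sep}{level}": [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }
```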

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1841 | 62.0%
. | 990 | 33.3%
1 | 139 | 4.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1841 | 93.0%
1 | 139 | 7.0%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

cancer_type=LGI
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 0.0 (931), 1.0 (59)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 931 | 94.0%
1.0 | 59 | 6.0%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1921 | 64.7%
. | 990 | 33.3%
1 | 59 | 2.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1921 | 97.0%
1 | 59 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

cancer_type=Lung
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 KiB
Top values: 0.0 (534), 1.0 (456)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2970
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 534 | 53.9%
1.0 | 456 | 46.1%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1524 | 51.3%
. | 990 | 33.3%
1 | 456 | 15.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1980 | 66.7%
Other Punctuation | 990 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1524 | 77.0%
1 | 456 | 23.0%

Other Punctuation
Value | Count | Frequency (%)
. | 990 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2970 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2970 | 100.0%

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
0.0
853 
1.0
137 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row1.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

ValueCountFrequency (%)
0.0 853
86.2%
1.0 137
13.8%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1843
62.1%
. 990
33.3%
1 137
4.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1843
93.1%
1 137
6.9%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1843
62.1%
. 990
33.3%
1 137
4.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1843
62.1%
. 990
33.3%
1 137
4.6%

cancer_type=UGI
Categorical

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
0.0
810 
1.0
180 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
0.0 810
81.8%
1.0 180
18.2%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1800
60.6%
. 990
33.3%
1 180
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1800
90.9%
1 180
9.1%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1800
60.6%
. 990
33.3%
1 180
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1800
60.6%
. 990
33.3%
1 180
6.1%

sex=Male
Categorical

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1.0
568 
0.0
422 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row1.0
5th row0.0

Common Values

ValueCountFrequency (%)
1.0 568
57.4%
0.0 422
42.6%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1412
47.5%
. 990
33.3%
1 568
19.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1412
71.3%
1 568
28.7%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1412
47.5%
. 990
33.3%
1 568
19.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1412
47.5%
. 990
33.3%
1 568
19.1%

drug_class=ctla-4
Categorical

IMBALANCE 

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
0.0
985 
1.0
5

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0

Common Values

ValueCountFrequency (%)
0.0 985
99.5%
1.0 5
0.5%
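The IMBALANCE percentages in this report can be reproduced with a simple formula: one minus the Shannon entropy of the class frequencies, divided by the log of the number of classes. Under this score, a perfectly balanced column scores 0% and a column dominated by a single class approaches 100%. For this column, 985 zeros and 5 ones give about 95.4%, which matches the alert in the overview. We have only checked the formula against the four imbalance percentages listed below, so treat it as an assumption about how the profiler computes the score, not as its documented definition. The LGI counts (931/59) are derived from the character table at the top of this section.

```python
import math

def imbalance(counts):
    """1 - H(p) / log(k): 0 for a uniform split, approaching 1 as one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Class counts taken from the value-count tables in this report.
print(f"{imbalance([985, 5]):.1%}")          # drug_class=ctla-4  -> 95.4%
print(f"{imbalance([931, 59]):.1%}")         # cancer_type=LGI    -> 67.4%
print(f"{imbalance([930, 54, 5, 1]):.1%}")   # stage_encoded      -> 81.9%
print(f"{imbalance([918, 41, 31]):.1%}")     # msi_type_encoded   -> 71.8%
```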

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1975
66.5%
. 990
33.3%
1 5
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1975
99.7%
1 5
0.3%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1975
66.5%
. 990
33.3%
1 5
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1975
66.5%
. 990
33.3%
1 5
0.2%

drug_class=pd-1/pd-l1
Categorical

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1.0
807 
0.0
183 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.0
2nd row0.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
1.0 807
81.5%
0.0 183
18.5%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1173
39.5%
. 990
33.3%
1 807
27.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1173
59.2%
1 807
40.8%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1173
39.5%
. 990
33.3%
1 807
27.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1173
39.5%
. 990
33.3%
1 807
27.2%

progression=Yes
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1.0
835 
0.0
155 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2970
Distinct characters3
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
1.0 835
84.3%
0.0 155
15.7%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0 1145
38.6%
. 990
33.3%
1 835
28.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1980
66.7%
Other Punctuation 990
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1145
57.8%
1 835
42.2%
Other Punctuation
ValueCountFrequency (%)
. 990
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2970
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1145
38.6%
. 990
33.3%
1 835
28.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2970
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1145
38.6%
. 990
33.3%
1 835
28.1%

age_encoded
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct6
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.5585859
Minimum0
Maximum5
Zeros16
Zeros (%)1.6%
Negative0
Negative (%)0.0%
Memory size15.5 KiB

Quantile statistics

Minimum0
5th percentile1
Q13
median4
Q34
95th percentile5
Maximum5
Range5
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.1906707
Coefficient of variation (CV)0.33459097
Kurtosis0.17730439
Mean3.5585859
Median Absolute Deviation (MAD)1
Skewness-0.75694884
Sum3523
Variance1.4176967
MonotonicityNot monotonic
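`age_encoded` takes only six values, so every statistic above can be recomputed from its value counts (tabulated later in this section). The sketch below rebuilds the 990 observations and applies the usual definitions: sample variance with an n - 1 denominator, plus the bias-adjusted skewness (G1) and excess kurtosis (G2) estimators. We assume these are the estimators pandas uses for the report, and they reproduce the reported values. Quartiles come from `statistics.quantiles`. Its default method differs from pandas' linear interpolation, but the two agree here because the neighbouring order statistics are equal.

```python
import statistics

# age_encoded value counts copied from the tables in this section (value -> rows).
COUNTS = {0: 16, 1: 48, 2: 109, 3: 241, 4: 346, 5: 230}

def expand(counts):
    """Rebuild the sorted column of observations from a value -> count table."""
    return [value for value, rows in sorted(counts.items()) for _ in range(rows)]

def describe(xs):
    """Recompute the report's descriptive statistics for a list of numbers."""
    n = len(xs)
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)          # sample variance, n - 1 denominator
    std = var ** 0.5
    dev = [x - mean for x in xs]
    s2 = sum(d ** 2 for d in dev)
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    # Bias-adjusted sample skewness (G1) and excess kurtosis (G2).
    skew = n * (n - 1) ** 0.5 / (n - 2) * s3 / s2 ** 1.5
    kurt = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    median = statistics.median(xs)
    return {
        "sum": sum(xs),
        "mean": mean,
        "std": std,
        "variance": var,
        "cv": std / mean,
        "kurtosis": kurt,
        "skewness": skew,
        "median": median,
        "mad": statistics.median(abs(x - median) for x in xs),
        "quartiles": statistics.quantiles(xs, n=4),  # [Q1, median, Q3]
    }

for name, value in describe(expand(COUNTS)).items():
    print(name, value)
```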
Histogram with fixed size bins (bins=6)

Common Values
ValueCountFrequency (%)
4 346
34.9%
3 241
24.3%
5 230
23.2%
2 109
11.0%
1 48
4.8%
0 16
1.6%

Minimum values
ValueCountFrequency (%)
0 16
1.6%
1 48
4.8%
2 109
11.0%
3 241
24.3%
4 346
34.9%
5 230
23.2%

Maximum values
ValueCountFrequency (%)
5 230
23.2%
4 346
34.9%
3 241
24.3%
2 109
11.0%
1 48
4.8%
0 16
1.6%

best_response_encoded
Categorical

HIGH CORRELATION 

Distinct4
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1
519 
2
212 
3
199 
0
60 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters990
Distinct characters4
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row2
2nd row1
3rd row2
4th row2
5th row1

Common Values

ValueCountFrequency (%)
1 519
52.4%
2 212
21.4%
3 199
20.1%
0 60
6.1%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
1 519
52.4%
2 212
21.4%
3 199
20.1%
0 60
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 990
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 519
52.4%
2 212
21.4%
3 199
20.1%
0 60
6.1%

Most occurring scripts

ValueCountFrequency (%)
Common 990
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 519
52.4%
2 212
21.4%
3 199
20.1%
0 60
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 519
52.4%
2 212
21.4%
3 199
20.1%
0 60
6.1%

stage_encoded
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct4
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
3
930 
2
54
1
5
0
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters990
Distinct characters4
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique1
Unique (%)0.1%
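Distinct counts how many different values a column takes. For `stage_encoded` that is 4, or 0.4% of the 990 rows. Unique counts the values that occur in exactly one row. Only code `0` of `stage_encoded` appears just once, so Unique is 1 (0.1%), while every other column in this section reports 0. A minimal sketch, rebuilding the column from its value counts:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for one column."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)                                    # number of different values
    unique = sum(1 for rows in counts.values() if rows == 1)  # values seen in exactly one row
    return distinct, 100 * distinct / n, unique, 100 * unique / n

# stage_encoded rebuilt from its value counts: 930 x "3", 54 x "2", 5 x "1", 1 x "0".
stage_encoded = ["3"] * 930 + ["2"] * 54 + ["1"] * 5 + ["0"]
distinct, distinct_pct, unique, unique_pct = distinct_and_unique(stage_encoded)
print(f"Distinct {distinct} ({distinct_pct:.1f}%), Unique {unique} ({unique_pct:.1f}%)")
# Distinct 4 (0.4%), Unique 1 (0.1%)
```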

Sample

1st row3
2nd row3
3rd row3
4th row3
5th row3

Common Values

ValueCountFrequency (%)
3 930
93.9%
2 54
5.5%
1 5
0.5%
0 1
0.1%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
3 930
93.9%
2 54
5.5%
1 5
0.5%
0 1
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 990
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 930
93.9%
2 54
5.5%
1 5
0.5%
0 1
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 990
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 930
93.9%
2 54
5.5%
1 5
0.5%
0 1
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 930
93.9%
2 54
5.5%
1 5
0.5%
0 1
0.1%

ECOG_encoded
Categorical

HIGH CORRELATION 

Distinct5
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1
544 
0
357 
2
76 
3
11
4
2

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters990
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row1

Common Values

ValueCountFrequency (%)
1 544
54.9%
0 357
36.1%
2 76
7.7%
3 11
1.1%
4 2
0.2%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
1 544
54.9%
0 357
36.1%
2 76
7.7%
3 11
1.1%
4 2
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 990
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 544
54.9%
0 357
36.1%
2 76
7.7%
3 11
1.1%
4 2
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common 990
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 544
54.9%
0 357
36.1%
2 76
7.7%
3 11
1.1%
4 2
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 544
54.9%
0 357
36.1%
2 76
7.7%
3 11
1.1%
4 2
0.2%

msi_type_encoded
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct3
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1
918 
0
41
2
31

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters990
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row0

Common Values

ValueCountFrequency (%)
1 918
92.7%
0 41
4.1%
2 31
3.1%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
1 918
92.7%
0 41
4.1%
2 31
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 990
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 918
92.7%
0 41
4.1%
2 31
3.1%

Most occurring scripts

ValueCountFrequency (%)
Common 990
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 918
92.7%
0 41
4.1%
2 31
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 918
92.7%
0 41
4.1%
2 31
3.1%

tx_line_encoded
Categorical

HIGH CORRELATION 

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
1
667 
0
323 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters990
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row1
2nd row0
3rd row1
4th row0
5th row1

Common Values

ValueCountFrequency (%)
1 667
67.4%
0 323
32.6%

Length

Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
1 667
67.4%
0 323
32.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 990
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 667
67.4%
0 323
32.6%

Most occurring scripts

ValueCountFrequency (%)
Common 990
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 667
67.4%
0 323
32.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 990
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 667
67.4%
0 323
32.6%

Interactions

Pairwise interaction plots of the six numeric variables (6 × 6 = 36 panels).

Correlations

ECOGECOG_encodedOSPFSageage_encodedbest_responsebest_response_encodedcancer_type=GUcancer_type=LGIcancer_type=Lungcancer_type=Melanomacancer_type=UGIclin_benefit=Yesdrug_class=ctla-4drug_class=pd-1/pd-l1idmsi_typemsi_type_encodednlrprogression=Yesresponse=Yessex=Malestagestage_encodedtmb_mutations_mbtx_linetx_line_encodedtx_yearvital_status
ECOG1.0001.000-0.372-0.2880.0670.1700.1460.1460.0000.0000.1610.2210.0560.2050.0000.0990.0350.0830.0830.2990.1900.2040.0420.0320.0320.0310.2370.2370.0690.269
ECOG_encoded1.0001.000-0.372-0.2880.0670.1700.1460.1460.0000.0000.1610.2210.0560.2050.0000.0990.0350.0830.0830.2990.1900.2040.0420.0320.0320.0310.2370.2370.0690.269
OS-0.372-0.3721.0000.7110.056-0.0480.3690.3690.1580.0000.0940.1900.1670.5980.0000.205-0.1200.0000.000-0.3550.4170.5390.0710.0570.0570.1420.3300.3300.4310.617
PFS-0.288-0.2880.7111.0000.0000.0200.5190.5190.1080.0000.0710.1720.0380.8610.0000.161-0.0150.0980.098-0.2580.7710.7600.0000.0680.0680.1960.3090.3090.3060.607
age0.0670.0670.0560.0001.0001.0000.0480.0480.0680.2030.1970.0490.0410.0000.0110.0770.0430.0570.0570.0650.0000.0180.0790.0000.0000.1680.0410.0410.0000.055
age_encoded0.1700.170-0.0480.0201.0001.0000.0480.0480.0680.2030.1970.0490.0410.0000.0110.0770.0430.0570.0570.0650.0000.0180.0790.0000.0000.1680.0410.0410.0000.055
best_response0.1460.1460.3690.5190.0480.0481.0001.0000.0000.0000.1480.3560.0690.9160.0000.1050.0360.0610.061-0.0270.6230.9990.0330.0880.0880.0010.3280.3280.0720.545
best_response_encoded0.1460.1460.3690.5190.0480.0481.0001.0000.0000.0000.1480.3560.0690.9160.0000.1050.0360.0610.061-0.0270.6230.9990.0330.0880.0880.0010.3280.3280.0720.545
cancer_type=GU0.0000.0000.1580.1080.0680.0680.0000.0001.0000.0900.3690.1550.1840.0000.0000.034-0.0960.0540.054-0.0460.0330.0000.0810.0910.091-0.1290.0000.0000.0930.000
cancer_type=LGI0.0000.0000.0000.0000.2030.2030.0000.0000.0901.0000.2260.0890.1090.0000.0000.000-0.0160.3720.372-0.0050.0000.0000.0330.0000.0000.1210.1020.1020.0700.000
cancer_type=Lung0.1610.1610.0940.0710.1970.1970.1480.1480.3690.2261.0000.3660.4320.0620.0410.0870.0770.1230.1230.1570.1120.0740.1190.0410.0410.0680.1400.1400.1220.093
cancer_type=Melanoma0.2210.2210.1900.1720.0490.0490.3560.3560.1550.0890.3661.0000.1820.1710.0000.2480.0200.0700.070-0.1460.2160.2000.0950.1810.1810.1970.4910.4910.1270.171
cancer_type=UGI0.0560.0560.1670.0380.0410.0410.0690.0690.1840.1090.4320.1821.0000.0310.0000.005-0.0120.0490.049-0.0130.0780.0260.0680.0000.000-0.2010.1260.1260.1640.054
clin_benefit=Yes0.2050.2050.5980.8610.0000.0000.9160.9160.0000.0000.0620.1710.0311.0000.0000.0840.0440.0920.092-0.1900.5840.8960.0000.0680.0680.1980.3190.3190.0910.553
drug_class=ctla-40.0000.0000.0000.0000.0110.0110.0000.0000.0000.0000.0410.0000.0000.0001.0000.127-0.0340.0000.0000.0040.0000.0000.0230.0000.0000.0080.0000.0000.0530.000
drug_class=pd-1/pd-l10.0990.0990.2050.1610.0770.0770.1050.1050.0340.0000.0870.2480.0050.0840.1271.0000.0630.0620.0620.0400.0790.1010.0000.0560.056-0.0120.1910.1910.1590.031
id0.0350.035-0.120-0.0150.0430.0430.0360.036-0.096-0.0160.0770.020-0.0120.044-0.0340.0631.0000.0380.0380.0260.0000.0000.0000.0350.035-0.0010.2510.2510.4760.077
msi_type0.0830.0830.0000.0980.0570.0570.0610.0610.0540.3720.1230.0700.0490.0920.0000.0620.0381.0001.000-0.0040.1250.0990.0160.0200.0200.1470.0890.0890.0000.033
msi_type_encoded0.0830.0830.0000.0980.0570.0570.0610.0610.0540.3720.1230.0700.0490.0920.0000.0620.0381.0001.000-0.0040.1250.0990.0160.0200.0200.1470.0890.0890.0000.033
nlr0.2990.299-0.355-0.2580.0650.065-0.027-0.027-0.046-0.0050.157-0.146-0.013-0.1900.0040.0400.026-0.004-0.0041.0000.0970.1230.0000.0000.0000.0330.1140.1140.0180.207
progression=Yes0.1900.1900.4170.7710.0000.0000.6230.6230.0330.0000.1120.2160.0780.5840.0000.0790.0000.1250.1250.0971.0000.5780.0360.1280.128-0.2330.2530.2530.0620.542
response=Yes0.2040.2040.5390.7600.0180.0180.9990.9990.0000.0000.0740.2000.0260.8960.0000.1010.0000.0990.0990.1230.5781.0000.0000.0660.0660.1910.3060.3060.1000.498
sex=Male0.0420.0420.0710.0000.0790.0790.0330.0330.0810.0330.1190.0950.0680.0000.0230.0000.0000.0160.0160.0000.0360.0001.0000.0260.0260.0540.1000.1000.0000.000
stage0.0320.0320.0570.0680.0000.0000.0880.0880.0910.0000.0410.1810.0000.0680.0000.0560.0350.0200.0200.0000.1280.0660.0261.0001.000-0.0800.1280.1280.0000.126
stage_encoded0.0320.0320.0570.0680.0000.0000.0880.0880.0910.0000.0410.1810.0000.0680.0000.0560.0350.0200.0200.0000.1280.0660.0261.0001.000-0.0800.1280.1280.0000.126
tmb_mutations_mb0.0310.0310.1420.1960.1680.1680.0010.001-0.1290.1210.0680.197-0.2010.1980.008-0.012-0.0010.1470.1470.033-0.2330.1910.054-0.080-0.0801.0000.1150.1150.0160.123
tx_line0.2370.2370.3300.3090.0410.0410.3280.3280.0000.1020.1400.4910.1260.3190.0000.1910.2510.0890.0890.1140.2530.3060.1000.1280.1280.1151.0000.9980.1770.226
tx_line_encoded0.2370.2370.3300.3090.0410.0410.3280.3280.0000.1020.1400.4910.1260.3190.0000.1910.2510.0890.0890.1140.2530.3060.1000.1280.1280.1150.9981.0000.1770.226
tx_year0.0690.0690.4310.3060.0000.0000.0720.0720.0930.0700.1220.1270.1640.0910.0530.1590.4760.0000.0000.0180.0620.1000.0000.0000.0000.0160.1770.1771.0000.103
vital_status0.2690.2690.6170.6070.0550.0550.5450.5450.0000.0000.0930.1710.0540.5530.0000.0310.0770.0330.0330.2070.5420.4980.0000.1260.1260.1230.2260.2260.1031.000
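Several pairs in this matrix score exactly 1.000, for example `stage` and `stage_encoded`, or `msi_type` and `msi_type_encoded`. This is expected whenever one column is a one-to-one relabeling of the other. This report appears to come from ydata-profiling. As we understand its documentation, the default "auto" method uses Cramér's V for pairs of categorical variables. The sketch below computes a plain, uncorrected Cramér's V; the profiler's own implementation may apply a bias correction. It shows that a relabeled copy of a column scores 1, while an independent pair scores 0. The labels `A` to `D` are hypothetical placeholders. Only the counts (930/54/5/1) come from this report.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V from the chi-squared statistic of the contingency table (no bias correction)."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # undefined for a constant column; report 0 here
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical labels "A".."D" carrying the stage_encoded counts from this report.
stage = ["A"] * 930 + ["B"] * 54 + ["C"] * 5 + ["D"]
codes = {"A": 3, "B": 2, "C": 1, "D": 0}
stage_encoded = [codes[s] for s in stage]

print(round(cramers_v(stage, stage_encoded), 3))  # 1.0, a one-to-one relabeling
print(cramers_v(list("aabb"), list("xyxy")))      # 0.0, no association
```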

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
idtx_yearagenlrmsi_typetmb_mutations_mbbest_responsePFSvital_statusOSstagetx_lineECOGresponse=Yesclin_benefit=Yescancer_type=GUcancer_type=LGIcancer_type=Lungcancer_type=Melanomacancer_type=UGIsex=Maledrug_class=ctla-4drug_class=pd-1/pd-l1progression=Yesage_encodedbest_response_encodedstage_encodedECOG_encodedmsi_type_encodedtx_line_encoded
28215201861 - 701.38Stable19.3PR3.450.09.59IVSubsequent-line0.01.01.00.00.01.00.00.00.00.01.01.0423011
38216201551 - 602.69Stable1.0PD0.560.050.14IVFirst-line0.00.00.00.00.00.01.00.00.00.00.01.0313010
48217201851 - 602.54Stable10.5PR4.670.09.99IVSubsequent-line0.01.01.00.00.01.00.00.00.00.01.01.0323011
68219201751 - 605.21Stable0.0PR8.611.018.83IVFirst-line0.01.01.00.00.00.00.01.01.00.01.01.0323010
88221201661 - 702.18Indeterminate2.0PD1.251.02.33IVSubsequent-line1.00.00.00.00.00.00.01.00.00.01.01.0413101
98222201751 - 604.62Stable4.4PD1.971.03.12IVSubsequent-line1.00.00.01.00.00.00.00.01.00.01.01.0313111
118223201761 - 702.56Indeterminate41.3SD7.131.07.13IVSubsequent-line0.00.00.00.01.00.00.00.00.00.01.01.0433001
148226201861 - 704.89Stable3.5PD1.350.03.22IVSubsequent-line1.00.00.00.00.01.00.00.01.00.00.01.0413111
178229201551 - 602.92Stable19.7CR45.960.048.82IVSubsequent-line1.01.01.00.00.01.00.00.00.00.01.00.0303111
188230201761 - 704.25Stable8.8PD1.641.06.77IVSubsequent-line1.00.00.01.00.00.00.00.01.00.01.01.0413111

Last rows
idtx_yearagenlrmsi_typetmb_mutations_mbbest_responsePFSvital_statusOSstagetx_lineECOGresponse=Yesclin_benefit=Yescancer_type=GUcancer_type=LGIcancer_type=Lungcancer_type=Melanomacancer_type=UGIsex=Maledrug_class=ctla-4drug_class=pd-1/pd-l1progression=Yesage_encodedbest_response_encodedstage_encodedECOG_encodedmsi_type_encodedtx_line_encoded
161710016201851 - 6012.30Stable2.6SD2.461.06.08IVFirst-line0.00.00.00.00.01.00.00.01.00.01.01.0333010
161810017201871 - 9519.88Unstable47.4SD1.741.01.74IVFirst-line1.00.00.00.00.01.00.00.00.00.01.01.0533120
162110020201861 - 709.45Stable17.6PD0.891.03.45IVFirst-line0.00.00.00.01.00.00.00.01.00.01.01.0413010
162210021201861 - 705.92Stable3.5CR10.910.013.83IVFirst-line1.01.01.00.00.00.00.01.01.00.01.01.0403110
162310023201861 - 701.86Stable7.9PR12.450.015.90IVFirst-line0.01.01.00.00.01.00.00.01.00.01.01.0423010
162410025201851 - 609.25Stable0.0PD0.821.00.82IVFirst-line1.00.00.00.00.01.00.00.01.00.01.01.0313110
162610027201861 - 706.50Stable12.3SD11.700.012.22IIISubsequent-line1.00.01.00.00.01.00.00.01.00.01.00.0432111
162710028201871 - 954.82Stable9.7PD1.541.01.54IVFirst-line1.00.00.00.00.01.00.00.01.00.01.01.0513110
162910030201851 - 604.92Stable3.5SD8.080.014.95IVFirst-line0.00.01.00.00.01.00.00.01.00.01.01.0333010
163010032201841 - 5034.67Stable9.7PD1.381.03.15IVSubsequent-line1.00.00.00.00.01.00.00.01.00.01.01.0213111
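In the sample rows, each `*_encoded` column appears to hold the position of its source label in sorted (alphabetical) order. `best_response` values CR, PD, PR and SD map to 0, 1, 2 and 3. `msi_type` values Indeterminate, Stable and Unstable map to 0, 1 and 2. `tx_line` values First-line and Subsequent-line map to 0 and 1. The age bands 41 - 50 through 71 - 95 follow the same pattern (2 to 5). This is consistent with a plain label encoding such as scikit-learn's `LabelEncoder`, which also sorts its classes, and it would explain the 1.000 correlations between each pair. The sketch below is an assumption about how the columns were built, reproducing the observed mapping in plain Python.

```python
def label_encode(labels):
    """Replace each label by its position among the sorted distinct labels."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping

codes, mapping = label_encode(["PR", "PD", "PR", "CR", "SD"])
print(mapping)  # {'CR': 0, 'PD': 1, 'PR': 2, 'SD': 3}
print(codes)    # [2, 1, 2, 0, 3]
```

The same function gives `{'Indeterminate': 0, 'Stable': 1, 'Unstable': 2}` for the three `msi_type` labels seen in the sample.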